Abstract

This  paper  develops  a  theory  of  interoperability  failures.  Interoperability  in  this  paper  refers  to 
the  exchange  of  information  and  the  use  of  information,  once  exchanged,  between  two  or  more 
systems.  The  need  for  a  theory  of  interoperability  failures  is  introduced  along  with  a  discussion 
of  the  reinforcing  relationship  between  theory  and  experiment.  First,  the  interoperability  of  two 
systems  over  time  is  considered.  The  failure  rate  for  electronic  equipment  as  it  ages  over  time 
often  follows  a  life  distribution  model  in  the  shape  of  the  widely  known  “Bathtub”  curve.  By 
analogy,  if  one  considers  the  interaction  of  two  systems  over  time,  a  theory  of  interoperability 
failures  can  be  developed  by  postulating  a  life  distribution  model  with  three  distinct  time  periods: 
early,  mediate,  and  relative  obsolescence.  A  causal  analysis  that  focuses  on  intended 
functionality,  requirements,  design  implementation,  and  developmental  testing  is  used  to  explain 
the  existence  of  these  three  time  periods.  Then,  the  relationship  between  interoperability  and 
complexity  in  terms  of  interaction  and  coupling  is  discussed.  Finally,  the  theory  is  used  to 
develop  criteria  for  selecting  specific  systems  to  study  and  collect  data  to  refute  or  lend  credence 
to  the  theory. 

1.  Introduction 

Achieving  interoperability  among  Command,  Control,  Communications,  Computers, 
Intelligence,  Surveillance,  and  Reconnaissance  (C4ISR)  systems  continues  to  be  a  challenge  for 
the  U.S.  Department  of  Defense  (DoD).  Progress  has  been  made  in  recent  years  through  the  use 
of  directives,  efforts  to  educate  and  train  practitioners,  an  increased  emphasis  on  capability  over 
platforms,  and  the  increased  use  of  integrated  architectures  and  mission  capability  packages. 

One  element  missing  from  this  mix  is  a  coherent,  verifiable  theory  of  interoperability  failures 
that  captures  the  causes  of  interoperability  faults  in  a  form  that  practitioners  can  use  to  avoid 
these  pitfalls  in  their  own  work. 


The  author  would  like  to  acknowledge  the  JMACA  team  and  the  Director,  Operational  Test  &  Evaluation  (DOT&E)  Joint  Test 
&  Evaluation  (JT&E)  Program  office.  The  contents  of  this  paper  reflect  the  author’s  own  personal  views  and  conclusions,  based 
on  independent  research  and  analysis.  They  do  not  necessarily  reflect  official  current  policy  in  any  agency  of  the  U.S. 
Government. 

The purpose of this paper is to develop a theory of interoperability failures that can be confirmed
by  objective  evidence. 

The  goal  of  developing  a  theory  of  interoperability  failures  is  to  be  able  to  efficiently  collect  the 
data  required  to  create  and  validate  prediction  rules  that  can  be  used  to  make  diagnostic  decisions 
about  conducting  end-to-end  interoperability  testing  of  C4ISR  equipment  strings.  [McBeth, 
2000] 

Interoperability  is  an  active  area  of  research.  The  roots  of  the  theory  developed  in  this  paper  can 
be  traced  to  previous  work  including  a  paper  by  Sutton  where  an  analogy  is  drawn  between 
interoperability  and  electronic  equipment  reliability  and  papers  by  Hamilton,  Melear,  and 
Endicott  where  interoperability  is  dealt  with  using  an  engineering  life  cycle  model  [Sutton,  1999] 
[Hamilton et al., 2002a] [Hamilton et al., 2002b].

In  section  2,  several  definitions  of  interoperability  are  discussed.  The  relationship  between  the 
definition  one  adopts  for  interoperability  and  the  problem  to  be  solved  is  examined.  A  working 
definition  of  an  interoperability  failure  is  introduced.  Section  3  addresses  the  question  of  why  a 
theory  of  interoperability  failures  is  needed.  Section  4  extends  Sutton’s  electronic  equipment 
reliability  analogy  to  gain  insights  into  the  interoperability  interaction  between  two  systems  over 
time.  This  analysis  suggests  a  notional  life  distribution  model  with  three  regions  where  specific 
interoperability  mechanisms  tend  to  dominate.  These  three  regions  are  called  the  early,  mediate, 
and  relative  obsolescence  failure  periods.  Section  5  discusses  the  relationship  of  interoperability 
and  complexity  in  terms  of  system  interaction  and  coupling.  These  ideas  will  be  used  to  help 
guide  the  selection  of  systems  for  study  within  each  of  these  regions.  Sections  6,  7,  and  8 
discuss  the  early,  mediate,  and  relative  obsolescence  failure  periods,  respectively.  In  each  of 
these  three  sections,  causal  relationships  are  proposed  for  each  failure  mechanism  and  a 
hypothesis  is  generated  to  explain  the  nature  of  the  failure  mechanism  expected  to  dominate. 
Also,  system  selection  criteria  are  proposed  to  look  for  evidence  to  refute  or  lend  credence  to  the 
theory.  Section  9  briefly  introduces  some  ideas  for  creating  a  prediction  rule  based  on  the 
theory.  Section  10  describes  the  next  steps  in  motivating  and  verifying  the  theory.  Section  11 
provides  a  brief  summary  of  the  paper. 

2.  Interoperability  Definitions 

The  standard  DoD  definition  of  interoperability  is: 

“(1)  The  ability  of  the  systems,  units,  or  forces  to  provide  services  to  and  accept 
services  from  other  systems,  units,  or  forces,  and  to  use  the  services  so  exchanged 
to  enable  them  to  operate  effectively  together,  and  (2)  the  condition  achieved 
among  communications-electronics  systems  or  items  of  communications- 
electronics  equipment  when  information  or  services  can  be  exchanged  directly 
and  satisfactorily  between  them  or  their  users.  The  degree  of  interoperability 
should  be  defined  when  referring  to  specific  cases.”  [CJCS,  2000] 

This  definition  is  all-encompassing  and  stated  at  a  high  level  of  abstraction  to  cover  a  wide 
variety  of  situations.  The  definition  was  modified  from  the  previous  release  of  the  instruction  to 
acknowledge that the “degree of interoperability” is case dependent and “for the purposes of this
instruction  ...  will  be  determined  by  the  accomplishment  of  the  proposed  Information  Exchange 
Requirement  (IER)  fields.”  [CJCS,  2000]  The  notion  of  an  IER  implies  an  “end-to-end”  thread 
or  string  that  allows  users  to  exchange  information  through  a  C4ISR  architecture.  Sutton 
recognized  the  need  to  focus  on  “end-to-end”  interoperability  and  provided  the  following 
definitions: 

end-to-end interoperability - “The probability of successful interoperation of all
subscribers in a network under specified conditions for a given mission time.”
[Sutton, 1999]

interoperability failure - “The inability of the network to meet specified
interoperability levels, conditions, and requirements, such as minimum acceptable
data transfer rate, quality of service, and maximum allowable latency.”
[Sutton, 1999]

Although  these  definitions  may  work  at  a  network  level,  they  fall  short  for  defining 
interoperability  at  the  end-to-end  thread  or  string  level.  These  definitions  allow  problems 
including  faulty  network  design,  improper  traffic  management,  and  intrinsic  hardware  failures  to 
serve  as  potential  causes  for  interoperability  failures.  While  one  can  argue  that  these  are  serious 
problems,  they  are  unlikely  to  be  detected  in  traditional  lab-based  end-to-end  testing  since  these 
problems  are  not  directly  related  to  the  interactions  between  two  or  more  systems. 

For  example,  although  intrinsic  hardware  failures  in  a  system  can  render  a  specific  instance  of  an 
equipment  string  and  its  corresponding  functional  thread  inoperative,  it  is  not  a  sign  that  the 
equipment  string,  per  se,  is  not  interoperable.  This  may  seem  like  a  fine  distinction,  but  it  is  a 
critical  one  if  the  definition  of  interoperability  is  to  lead  to  prediction  rules  useful  for  selecting 
equipment  strings  for  end-to-end  testing.  Including  causal  indicators  for  intrinsic  hardware 
failures  in  a  prediction  rule  could  result  in  a  diagnostic  protocol  with  an  unacceptably  high  (and 
costly) false alarm rate. Therefore, traditional Mean-Time-Between-Failure (MTBF)-type
failures,  which  are  unlikely  to  be  detected  during  end-to-end  testing,  should  be  excluded.  These 
failures  are  more  of  a  reliability  issue  that  can  be  better  addressed  through  well  known  design 
measures  and  reliability  growth  testing.  [Fuqua,  1987] 

With  the  dual  principles  of  system  interaction  and  end-to-end  testing  in  mind,  definitions  for  an 
equipment  string  and  a  functional  thread  are  presented.  These  definitions  will  set  the  stage  for 
working  definitions  for  interoperability,  interoperability  failure,  and  interoperability  fault. 


equipment string - a serial sequence of N systems connected with N-1 links that
provides a communication path between users to exchange information.

functional thread - a construct consisting of the equipment string input, equipment
string output, a description of the transformations to be performed, and the
conditions under which this should occur. [INCOSE, 2000]

Figure 1. Graphical representation of an equipment string.

Mathematically,  an  equipment  string  can  be  viewed  as  a  type  of  connected  graph  having  no 
cycles  called  a  tree.  Equipment  strings  are  trees  with  no  vertex  having  a  degree  greater  than  two. 
[Chartrand,  1977]  This  connection  to  graph  theory  may  open  equipment  string  analysis  to  a  host 
of  theorems  and  results  from  mathematics.  A  detailed  application  of  graph  theory  to  equipment 
strings  is  left  for  future  work.  A  paper  by  Coudert  and  Munoz  gives  a  flavor  of  how  graph 
theory  has  been  applied  to  similar  communications  engineering  problems.  [Coudert  &  Munoz, 
2001] 
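
The graph-theoretic description above can be made concrete with a short check. The sketch below is illustrative only and is not taken from the cited sources: the function name, the integer numbering of systems, and the list-of-pairs representation of links are all assumptions. It tests the three properties stated in the text: N-1 links for N systems, no vertex of degree greater than two, and connectedness.

```python
from collections import defaultdict, deque


def is_equipment_string(num_systems, links):
    """Return True if the systems and links form a path graph.

    Systems are numbered 0..num_systems-1 (an assumed convention);
    links is a list of (a, b) pairs of system numbers.
    """
    if num_systems < 1:
        return False
    # A tree on N vertices has exactly N-1 edges.
    if len(links) != num_systems - 1:
        return False
    neighbors = defaultdict(set)
    for a, b in links:
        if a == b or not (0 <= a < num_systems and 0 <= b < num_systems):
            return False
        neighbors[a].add(b)
        neighbors[b].add(a)
    # No vertex may have a degree greater than two.
    if any(len(neighbors[v]) > 2 for v in range(num_systems)):
        return False
    # Connectivity check: breadth-first search starting at system 0.
    seen = {0}
    queue = deque([0])
    while queue:
        v = queue.popleft()
        for w in neighbors[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == num_systems
```

Under these assumptions, four systems linked in series pass the check, while a hub system linked directly to three others fails it because the hub has degree three.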

The  functional  thread  definition  above  is  adapted  from  the  International  Council  on  Systems 
Engineering’s  (INCOSE)  definition  of  a  stimulus-condition-response  thread.  Conceptually,  an 
equipment  string  is  a  means  of  defining  a  communications  path  between  users  and  a  functional 
thread  is  a  way  of  defining  the  behavior  of  the  string  with  respect  to  how  information  is 
exchanged  and  used. 

Interoperability  failures  of  concern  here  deal  with  the  interaction  and  coupling  of  two  or  more 
systems.  The  following  are  provided  as  working  definitions  for  this  paper: 

Interoperability - “the ability of two or more systems to exchange information
and to mutually use the information that has been exchanged.” [IEEE, 1988]

Interoperability fault - a defect or condition related to system interaction that
causes a reproducible malfunction in the ability of two or more systems to
exchange information and to mutually use the information that has been
exchanged. Note: a malfunction is considered reproducible if it occurs
consistently under the same circumstances. [Adapted from FS-1037C, 1996]

Interoperability failure - the inability, due to an interoperability fault, of two or
more systems to exchange information and to mutually use the information that
has been exchanged.

These definitions are not limited to C4ISR equipment strings and functional threads. However,
they  do  cover  the  central  issues  of  interoperability  at  the  end-to-end  thread  or  string  level 
discussed  above.  One  should  think  of  these  interoperability  definitions  as  being  associated  with 
the  interaction  of  systems  composed  of  specific  hardware  and  software  releases. 

The  wording  of  these  definitions  needs  to  be  precise  since  they  drive  the  problem  statement  that 
is  used  to  develop  the  theory  of  interoperability  failures  in  this  paper. 
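
To show how the reproducibility note in the fault definition might be applied to test records, the following sketch groups trial outcomes by the circumstances under which they were run. It is one possible reading, not a procedure from the cited sources: “consistently” is interpreted here as “on every recorded trial,” and the `min_repeats` threshold, the function name, and the record format are assumptions.

```python
from collections import defaultdict


def reproducible_circumstances(trials, min_repeats=2):
    """Return the circumstances under which a malfunction looks reproducible.

    trials is an iterable of (circumstance, succeeded) pairs. A circumstance
    qualifies when it was tried at least min_repeats times and the
    information exchange failed on every one of those trials.
    """
    outcomes = defaultdict(list)
    for circumstance, succeeded in trials:
        outcomes[circumstance].append(succeeded)
    return sorted(
        circumstance
        for circumstance, results in outcomes.items()
        if len(results) >= min_repeats and not any(results)
    )
```

With this reading, an intermittent malfunction (some successes and some failures under the same circumstances) would not be counted as evidence of an interoperability fault, which is consistent with excluding random hardware failures from the definition.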

3.  Why  a  theory  of  interoperability  failures? 

The  primary  purpose  of  any  theory  is  to  clarify  concepts  and  ideas 
that  have  become,  as  it  were,  confused  and  entangled.  Not  until 
terms  and  concepts  have  been  defined  can  one  hope  to  make  any 
progress  in  examining  the  question  clearly  and  simply  and  expect 
the reader to share one’s views.


—  Carl  Von  Clausewitz 
On  War 

A  theory  of  interoperability  failures  is  needed  to  guide  the  collection  of  data  required  to  identify 
causal  indicators  and  build  statistical  prediction  rules  for  selecting  end-to-end  equipment  strings 
for  interoperability  testing.  Previous  efforts  to  achieve  a  sound  understanding  and  model  for 
end-to-end  interoperability  performance  have  been  hampered  by  a  lack  of  data  collection  and 
analysis  to  quantify  the  relationships  between  interoperability  performance  and  its  contributing 
factors.  This  has  primarily  been  due  to  the  large  number  of  potential  contributing  factors  that 
exist  to  be  studied. 

A  sound  working  theory  of  interoperability  failures  can  help  overcome  this  situation  by 
“informing”  and  guiding  the  experimental  designs  to  focus  on  the  contributing  factors  that  are 
most  closely  related  to  the  mechanisms  and  modes  suspected  to  be  responsible  for 
interoperability  faults  and  failures. 

The  reinforcing  relationship  between  theory  and  experiment  is  depicted  in  figure  2  where  a 
curious  observation  leads  to  the  construction  of  a  plausible  concept.  A  plausible  concept  is 
further  manipulated  through  the  logical  processes  of  abduction  to  provide  explanations,  induction 
to  provide  generalizations,  and  deduction  to  provide  consistency.  [Flach  &  Kakas,  2000] 

These  logical  processes  lead  to  hypotheses  and  models  which  inform  the  design  of  experiments 
where  measurements  are  made  to  collect  and  analyze  data  leading  to  interpreted  results.  These 
interpreted  results  serve  to  provide  evidence  for  or  against  the  theory  being  tested.  The  theory 
and  experiment  process  can  be  seen  as  one  form  of  the  scientific  method. 

Figure 2. An idealized depiction of the reinforcing relationship between theory and
experiment. 


There  are  several  examples  of  where  theory  has  guided  experiment  toward  new  knowledge.  The 
most  famous  example  is  that  of  Albert  Einstein’s  1905  Special  Theory  of  Relativity  which  used 
arguments  of  symmetry  and  invariance  to  transform  the  science  of  mechanics.  [Weinberg,  2001] 

A  theory  of  interoperability  failures  allows  one  to  narrow  the  focus  of  investigation  to  a  certain 
set  of  facts.  For  example,  suppose  it  is  believed  that  the  number  and  scale  of  hardware  or 
software  upgrades  in  one  system  relative  to  another  is  positively  correlated  to  the  likelihood  of 
an  interoperability  failure  occurring  between  these  systems.  In  this  case,  one  might  begin  to 
study  this  aspect  of  interoperability  failures  by  looking  at  older,  legacy  systems  to  see  what 
problems  have  occurred  when  other  systems  they  interoperate  with  have  been  more  rapidly 
upgraded. 

Obtaining  a  sound  theory  of  interoperability  failures  is  useful  beyond  guiding  the  data  collection 
required  to  build  a  prediction  rule  for  end-to-end  testing  of  C4ISR  equipment  strings.  Sutton 
states  that  “it  will  be  possible  to  determine  how  effective  directives,  strategies,  objectives,  plans, 
and  other  factors  are  in  improving  interoperability  [only  once  we  have]  a  theory  of 
interoperability  that  can  be  confirmed  by  repeatable  experiments  and  improved  through 
consistent  empirical  data  collection  and  analysis.”  [Sutton,  1999] 

4. Interoperability over time: Extending Sutton’s Analogy

Paul  Sutton’s  paper,  “Interoperability:  A  New  Paradigm,”  provides  a  refreshing  discussion  of 
interoperability  that  in  large  part  influenced  the  direction  of  the  work  in  this  paper.  His  critique 
of  the  shortcomings  of  the  Levels  of  Information  Systems  Interoperability  (LISI)  descriptive 
model resonates with this author’s experience in trying to use LISI as the basis of a diagnostic
prediction  tool.  Specifically,  Sutton  identifies  five  “significant  deficiencies”  in  the  LISI  model: 

“First,  it  does  not  address  specific  electrical  interfaces,  which  are  necessary  for 
simple  connectivity  between  system  components,  a  necessary  but  insufficient 
condition  for  interoperability.  Second,  it  does  not  address  the  issue  of  compatible 
objects  and  object  models  ...  Third,  it  assigns  nominal  values  to  the  degree  of 
interoperability  between  two  different  systems  which  are  based  on  system 
documentation,  but  does  not  provide  any  objective  system  performance  measures 
that  are  based  on  actual  operation  of  the  systems.  Fourth,  its  method  of  assigning 
interoperability  scores  doesn’t  take  into  account  the  fact  that  some  systems  may 
not  need  to  connect  to  other  systems  at  higher  levels  of  interoperability  to  be 
considered  successful.  Finally,  it  does  not  explain  how  interoperability  can  be 
controlled,  changed,  or  improved.”  [Sutton,  1999] 

Sutton  draws  on  the  analogy  of  electronic  equipment  reliability  to  postulate  a  theory  of 
interoperability  failures  (Sutton  uses  the  term  interoperability  performance).  [Sutton,  1999] 
However,  his  assumptions  for  random  interoperability  failures  and  a  constant  interoperation 
failure  rate  can  be  seen  as  being  too  optimistic.  This  leads  to  a  model  that  is  too  simple  to  be 
useful  as  the  basis  of  a  prediction  rule.  Consequently,  this  theory  of  interoperability  failures 
lacks  sufficient  “relation-structure  to  the  process  it  models.”  [Johnson-Laird,  1983]  Thus  a  large 
list  of  potential  contributing  factors  emerges  leading  to  a  data  collection  and  analysis  effort  that 
is  expensive,  time  consuming,  and  likely  never  to  be  undertaken.  What  is  needed  is  a  theory  of 
interoperability  failures  that  simplifies  the  task  of  collecting  the  data  required  to  quantify  the 
relationships  between  interoperability  and  its  contributing  factors. 

By  challenging  the  assumption  of  a  constant  interoperability  failure  rate  and  carefully 
considering  the  failure  mechanisms  of  electronic  equipment,  it  is  possible  to  draw  a  more  useful 
analogy.  Consider  that  the  failure  rate  for  electronic  and  mechanical  equipment  as  they  age  over 
time  often  follows  a  life  distribution  model  in  the  shape  of  the  widely  known  “Bathtub”  curve. 
[NIST, 2003, Section 8.1.2.4]

The Bathtub Curve

Figure 3. "Bathtub" curve life distribution model for the failure
rates typically found for electronic and mechanical equipment.
Source: Adapted from NIST, 2003, Section 8.1.2.4.


To  fully  appreciate  the  development  of  this  analogy,  it  helps  to  review  some  details  of  the 
Bathtub  curve  as  it  applies  to  electronic  equipment  reliability.  This  life  distribution  has  three 
distinct  periods  where  specific  failure  mechanisms  tend  to  dominate.  The  first  period  is  called 
the  early  failure  period  or  infant  mortality  period  where  the  instantaneous  failure  rate  starts  out 
relatively  high  and  then  rapidly  decreases  over  a  time  frame  lasting  from  several  weeks  to  a  few 
months. [NIST, 2003, Section 8.1.2.4] The failure mechanisms that tend to dominate in this
period are related to quality and can be attributed to variations in manufacturing processes and
defects in materials resulting in “weaker parts” that cause failures during normal equipment
operation. [Fuqua, 1987, p. 4] The second period is called the intrinsic failure period, which is
characterized by a constant instantaneous failure rate and is where most equipment spends the
majority of its useful life. Here the failure mechanisms are related to quality, stress, or wear. They
“are random in nature and are randomly distributed with respect to time.” [Fuqua, 1987, p. 4]
The third period is called the wear-out failure period. It is characterized by a rising instantaneous
failure rate over time. The dominant failure mechanism here tends to be “accumulated
damage due to the applied stresses” causing parts to “become weaker, more prone to failure, and
thus fail with increasing frequency.” [Fuqua, 1987, p. 6] These three periods are depicted
graphically in figure 3.
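
The three periods can be reproduced numerically. The sketch below is not drawn from the cited references: it assumes, as one common modeling convenience, that the bathtub shape can be approximated by adding a decreasing Weibull hazard (shape less than 1), a constant hazard, and an increasing Weibull hazard (shape greater than 1). All parameter values are illustrative and chosen only to make the shape visible.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate of a Weibull distribution at time t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t, early=(0.5, 200.0), constant_rate=0.001, wearout=(4.0, 3000.0)):
    """Illustrative bathtub hazard: early + intrinsic + wear-out terms.

    early and wearout are (shape, scale) pairs; every default value here
    is an assumption made for illustration, not a measured quantity.
    """
    return (
        weibull_hazard(t, *early)       # decreasing: early failure period
        + constant_rate                 # flat: intrinsic failure period
        + weibull_hazard(t, *wearout)   # increasing: wear-out failure period
    )


if __name__ == "__main__":
    for hours in (1.0, 10.0, 100.0, 1000.0, 3000.0, 5000.0):
        print(f"t = {hours:7.0f} h  hazard = {bathtub_hazard(hours):.5f}")
```

With these assumed parameters, the computed rate is high at the start, falls toward the constant term in the middle of the range, and rises again at large t.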

The  analogy  between  equipment  failures  and  interoperability  failures  can  be  extended  and 
improved  if  we  confine  our  analysis  to  the  interoperability  interaction  between  two  systems  over 
time  and  assume  the  resulting  model  can  be  extended  to  equipment  strings  using  a  pair-wise 
comparison technique. The power of the analogy lies in the fact that different failure
mechanisms may tend to dominate at different times in the history of the interaction between
two systems, not in there being exactly three time periods that correspond to the bathtub curve.
In other words, one should not expect the failure mechanisms found in equipment failures to
apply to interoperability failures, and the resulting life distribution model for interoperability
failures may not look like a bathtub curve. Time in this analogy is not the time since a piece of
equipment began operating, but the time that the two systems under study have been
interoperating.
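
The pair-wise extension to equipment strings is not spelled out in the text. One possible reading is sketched below: the string is broken into its N-1 neighboring pairs, each pair is given a probability of successful interoperation, and the pairs are treated as independent so that the probabilities multiply. The independence assumption, the function names, and the example values are all hypothetical.

```python
from math import prod


def adjacent_pairs(string):
    """List the N-1 neighboring system pairs in an equipment string of N systems."""
    return list(zip(string, string[1:]))


def string_interop_probability(string, pair_probability):
    """Combine pairwise success probabilities along an equipment string.

    pair_probability maps each (upstream, downstream) pair to an assumed
    probability of successful interoperation. Pairs are treated as
    independent, which is a simplifying assumption.
    """
    return prod(pair_probability[pair] for pair in adjacent_pairs(string))


if __name__ == "__main__":
    string = ["radio", "crypto", "router"]   # hypothetical equipment string
    pair_probability = {("radio", "crypto"): 0.9, ("crypto", "router"): 0.8}
    print(adjacent_pairs(string))
    print(string_interop_probability(string, pair_probability))
```

If interactions between neighboring pairs turn out to be correlated, a simple product of this kind would no longer apply, which is one reason the pair-wise extension is stated in the text as an assumption.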

However,  if  one  thinks  about  the  interoperability  interaction  between  two  systems  over  time,  a 
case  can  be  made  for  three  distinct  periods  of  time  where  different  sets  of  interoperability  failure 
mechanisms tend to dominate. These three postulated time periods (early, mediate, and relative
obsolescence) are described below.

First,  the  early  failure  period  is  postulated  to  start  with  a  relatively  higher  failure  rate  and 
decrease  over  time  because  the  two  systems  have  little  or  no  experience  interoperating  with  each 
other.  Each  system  was  created  with  an  intended  functionality  that  was  captured  in  a  set  of 
requirements  which,  in  turn,  was  translated  into  a  design  implementation.  This  design 
implementation  was  then  submitted  to  some  level  of  developmental  testing  to  verify  the  intended 
functionality.  Errors  and  inadequacies  in  this  development  process  will  not  become  evident  until 
the  two  systems  actually  begin  interacting  and  interoperating.  The  idea  of  using  the  development 
process  or  engineering  life  cycle  as  a  reasonable  approach  to  “deal  with  interoperability”  is 
echoed  in  the  writing  of  Hamilton,  Melear,  and  Endicott.  [Hamilton  et  al.,  2002b]  This  suggests 
an  experimental  approach  that  focuses  on  collecting  data  about  systems  that  have  recently  been 
introduced  into  service  to  understand  the  statistics  of  the  early  failure  period. 

Second,  after  the  early  failure  period,  a  mediate  failure  period  is  postulated  with  a  relatively  low 
failure  rate  because  the  two  systems  have  some  experience  and  a  history  of  interoperating  with 
each  other.  Here  the  intended  functionality  has  been  exercised  and  is  known  to  work.  This 
period  is  called  “mediate”  because  it  occupies  a  middle  position  between  the  early  failure  and 
relative  obsolescence  failure  periods.  However,  there  may  exist  rare  or  infrequently  used 
functional  threads  associated  with  equipment  strings  containing  the  two  systems  which  trigger 
latent  interoperability  faults  between  the  systems  (think  of  the  Year  2000  problem,  for  example). 
Additionally,  there  may  be  new  modes  of  operation  or  changes  in  tactics,  techniques,  and 
procedures  that  “stretch”  or  “stress”  the  intended  functionality  and  result  in  interoperability 
failures  (think  of  new  warfighting  experiments).  So  the  likelihood  of  interoperability  failures 
would  be  related  to  the  probability  of  these  events  occurring.  This  suggests  an  experimental 
approach  that  focuses  on  collecting  data  about  problems  resulting  from  systems  being  used  in 
new  operational  contexts  and  exercises.  Data  also  needs  to  be  collected  about  systems  that  have 
interoperated  for  some  time  without  a  great  difference  in  their  relative  upgrade  histories. 
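
The dependence of mediate-period failures on triggering events can be written out as a simple relation. Suppose, purely for illustration, that the triggering events (a rarely used functional thread being exercised, or a new mode of operation being introduced) are treated as mutually exclusive events E_i, and that no interoperability failure F occurs in this period without such a trigger. Neither assumption is made in the theory itself. Then:

```latex
P(F) = \sum_{i} P(E_i)\, P(F \mid E_i)
```

Under this reading, the low mediate failure rate follows from the triggering events being rare, even if the conditional probability of failure given a trigger is high.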

Third,  a  relative  obsolescence  failure  period  is  postulated  with  a  relative  failure  rate  that 
increases  over  time  because  of  the  introduction  of  newer  software  and  hardware  upgrades  in  one 
system relative to the other. The greater the number and scale of these upgrade changes, the more
likely interoperability failures are to occur (think of the compatibility of software written for an
8088 microprocessor with a Pentium, or trying to use files from an early version of WordPerfect
with the latest release of Microsoft Word). This suggests an experimental approach that focuses
on  collecting  data  about  systems  that  have  been  in  service  for  many  years  and  looking  at  the 
number  of  interoperability  problems  as  a  function  of  the  separation  in  frequency  and  scale  of 
system  upgrades  over  time. 
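
To indicate what data such a study might record, the sketch below computes a crude “separation” between the upgrade histories of two systems. The theory does not define this measure: the `Upgrade` record, the numeric scale score, and the two gap values returned are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Upgrade:
    year: float   # when the upgrade was fielded
    scale: float  # hypothetical size score, e.g. 1 = patch, 5 = major release


def upgrade_separation(upgrades_a, upgrades_b, as_of_year):
    """Return (count_gap, scale_gap) between two upgrade histories.

    Only upgrades fielded on or before as_of_year are counted. count_gap is
    the difference in the number of upgrades; scale_gap is the difference in
    the summed scale scores.
    """
    a = [u for u in upgrades_a if u.year <= as_of_year]
    b = [u for u in upgrades_b if u.year <= as_of_year]
    count_gap = abs(len(a) - len(b))
    scale_gap = abs(sum(u.scale for u in a) - sum(u.scale for u in b))
    return count_gap, scale_gap
```

Interoperability problem counts could then be tabulated against these gaps at successive dates to look for the increasing trend that the relative obsolescence period postulates.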

These failure periods are depicted in figure 4.

Figure 4. Postulated life distribution model for theory of
interoperability failures. (Horizontal axis: time two systems have been interoperating.)

Sections  6,  7,  and  8  discuss  each  time  period  in  more  detail.  For  each  time  period,  causal 
relationships  behind  the  failure  mechanisms  are  postulated  and  a  hypothesis  is  generated  to 
explain  the  nature  of  the  failure  mechanism  expected  to  dominate.  System  selection  criteria  are 
proposed  to  look  for  evidence  to  refute  or  lend  credence  to  the  theory. 

5.  Interoperability  and  complexity:  System  Interaction  and  Coupling 

There  is  more  to  the  interoperability  picture  than  a  life  distribution  model  that  captures  the 
interoperability  interaction  of  two  systems  over  time.  The  quality  or  nature  of  the 
interoperability  between  two  systems  can  also  influence  the  likelihood  of  interoperability 
failures.  Conceptually,  the  interoperability  between  two  critical  real-time  systems  requiring 
multiple  synchronous  transactions  to  complete  an  information  exchange  is  inherently  more 
complicated  than  the  interoperability  between  two  noncritical  low-speed  systems  requiring  only 
a  single  asynchronous  transaction  to  complete  an  information  exchange.  There  are  just  more 
opportunities  for  things  to  go  wrong  in  the  more  complicated  information  exchange  case. 
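
The “more opportunities for things to go wrong” argument can be illustrated with a small calculation. The sketch assumes that every transaction in an information exchange fails independently with the same probability and that the exchange fails if any one transaction fails. These are simplifying assumptions introduced here, and the function name is hypothetical.

```python
def exchange_failure_probability(per_transaction_failure, transactions):
    """Probability that at least one of several transactions fails.

    Assumes each transaction fails independently with the same probability.
    """
    if not 0.0 <= per_transaction_failure <= 1.0:
        raise ValueError("per_transaction_failure must be between 0 and 1")
    if transactions < 1:
        raise ValueError("transactions must be at least 1")
    return 1.0 - (1.0 - per_transaction_failure) ** transactions


if __name__ == "__main__":
    for n in (1, 5, 20):
        print(n, round(exchange_failure_probability(0.01, n), 4))
```

Under these assumptions, an exchange that needs five transactions is roughly five times as likely to fail as one that needs a single transaction when the per-transaction failure probability is small.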

One  way  to  approach  interoperability  and  complexity  is  through  the  attributes  of  “interaction” 
and  “coupling.”  This  idea  can  be  traced  to  the  work  of  Charles  Perrow  in  his  book  Normal 
Accidents  and,  later,  to  John  Rushby  who  extended  these  ideas  to  computer  systems.  [Perrow, 
1984,  Chapter  3]  [Rushby,  1994] 

Rushby describes these attributes as:

Interaction,  which  can  range  from  “linear”  to  “complex,”  refers  to  the  extent  to 
which  the  behavior  of  one  component  in  a  system  can  affect  the  behavior  of 
other  components.  In  a  simple,  linear  system,  components  affect  only  those 
others  that  are  functionally  “downstream”  of  them;  in  a  more  complex  system,  a 
single  component  may  participate  in  many  different  sequences  of  interactions 
with  many  other  components.  In  computer  systems,  the  notion  of  “component” 
must  include  both  physical  and  abstract  entities;  for  example,  the  abstract  entity 
“database”  is  a  component,  as  are  its  processes  and  data,  and  also  the  devices  that 
provide  execution  and  storage.  Computer  systems  that  maintain  global  notions 
of  coordination  and  consistency  (e.g.,  distributed  databases)  are  considered  to 
have  complex  interactions,  since  activities  in  different  locations  interact  with 
each  other.  [Rushby,  1994,  Chapter  4,  p.  42] 

Coupling,  which  can  range  from  “loose”  to  “tight,”  refers  to  the  extent  to  which 
there  is  metaphorical  “slack”  or  “flexibility”  in  the  system.  Coupling  is  not  an 
independent notion; we really have to ask “coupling to what?” For the
preliminary  analysis  being  undertaken  here,  however,  we  can  tolerate  the 
imprecision  of  the  unqualified  term,  and  supply  more  specificity  when  needed. 

Loosely coupled systems are usually less time constrained than tightly coupled
ones,  can  tolerate  things  being  done  in  different  sequences  than  those  expected, 
and  may  be  adaptable  to  different  assumptions  than  those  originally  considered. 

For  example,  craft  industries  are  usually  loosely  coupled,  whereas  production 
lines  with  just-in-time  inventory  control  are  tightly  coupled.  Viewed  as  a 
computer  system,  the  telephone  switching  network  may  be  considered  loosely 
coupled,  since  there  are  multiple  ways  to  route  calls,  whereas  most  hard-real- 
time  control  systems  are  tightly  coupled,  since  they  depend  on  everything 
behaving  as  expected.  [Rushby,  1994,  Chapter  4,  p.  42] 

These  ideas  of  interaction  and  coupling  will  be  used  in  shaping  the  system  selection  criteria  in 
the following three sections to ensure that the quality or nature of the interoperability between
two  systems  is  considered. 

6.  Early  Failure  Period 

The  early  failure  period  starts  when  two  systems  first  begin  interoperating  with  each  other.  What 
types  of  interoperability  failure  mechanisms  might  one  expect  to  see  dominating  in  this  early 
period? 

The system development process illustrated in figure 5 is useful for understanding the failure
mechanisms that are likely to dominate in the early failure period. The process starts with the
intended functionality to satisfy a need or provide a capability. For C4ISR equipment strings, this
could range from providing confidentiality to prevent eavesdropping to transmitting real-time
imagery over the horizon. Next, this intended functionality is captured in a set of requirements.
This phase of the development process is a logical place to start looking for sources of
interoperability failures. Are the intended functionality and required interoperability captured in
the requirements?


[Figure 5 diagram; recoverable block labels: Intended Functionality, Design Implementation,
Developmental Testing; arrow label: "Translated into"]

Figure 5. System development process used to discuss causal relationships
for early failure period failure mechanisms.

Root  causes  for  defects  associated  with  the  requirements  phase  include  missing  and  inadequate 
requirements.  Problems  manifested  from  missing  requirements  include  equipment  that  gets 
designed  and  built  without  key  interfaces,  protocols,  and  operating  modes.  Problems  manifested 
from  inadequate  requirements  include  weak  verification  criteria  that  cause  interoperability 
problems  to  escape  detection  during  developmental  testing  and  poorly  defined  technical 
performance  criteria  that  result  in  timing,  electrical,  frequency,  and  mechanical  mismatches. 

In  the  next  phase  of  the  development  process,  requirements  are  translated  into  a  design 
implementation.  This  phase  provides  another  place  to  look  for  sources  of  interoperability 
failures.  Does  the  design  provide  for  the  intended  functionality  and  required  interoperability 
captured  in  the  requirements? 

Root  causes  for  defects  associated  with  the  design  phase  include  waived  requirements  and  design 
flaws.  Problems  manifested  from  waived  requirements  include  those  listed  above  for  missing 
and  inadequate  requirements.  Problems  manifested  from  design  flaws  range  from  malfunctions 
stemming  from  mistakes  in  electrical  design  calculations  to  programming  errors  that  cause 
protocol  conflicts  and  other  undesirable  behavior. 

In the next phase of the development process, the design implementation is submitted to
developmental testing. In this phase, sources of interoperability failures are not normally
introduced into the systems; instead, existing faults go undetected and are not screened out before
the system is introduced into an operational environment.

Root causes for defects associated with the developmental testing phase include inadequate
testing  to  verify  intended  functionality  in  the  design  implementation.  Problems  manifested  from 
inadequate  testing  include  those  associated  with  weak  verification  criteria.  This  can  lead  to 
failures  in  the  function  of  interfaces,  protocols,  and  operating  modes  or  failures  in  technical 
performance  often  seen  as  timing,  electrical,  frequency,  or  mechanical  mismatches. 

Ideally,  one  would  want  to  adopt  a  standard  terminology  to  describe  these  fault  classes.  A 
standard  terminology  makes  it  easier  to  make  comparative  analyses  and  draw  general 
conclusions.  Although  the  author  is  not  aware  of  research  community-agreed-upon  standard 
terminology,  Delores  Wallace  and  Richard  Kuhn  of  the  U.S.  National  Institute  of  Standards  and 
Technology  have  identified  several  fault  classes  based  on  their  study  of  medical  device  failures 
and  research  into  several  published  taxonomies.  Examples  of  the  fault  types  they  identify 
include logic faults and calculation faults. Since interoperability performance and
failures are largely driven by software, it seems reasonable to follow the fault analysis
conventions and terminology adopted by these researchers. [Wallace & Kuhn, 1999]

This  approach  also  makes  it  easier  to  adopt  the  tools  and  techniques  they  have  developed  and 
made  available  to  the  research  community.  This  will  also  make  it  easier  to  compare  fault 
distributions  between  application  domains  they  have  studied  and  the  military  C4ISR  domain. 
[Wallace  et  al.,  1997] 

Based on the preceding discussion of the causal relationships postulated to be behind failure
mechanisms in the early failure period, the following hypothesis is generated:


HYPOTHESIS: Interoperability failure mechanisms expected to cause most failures in the early failure
period  stem  from  1)  missing  or  inadequate  requirements,  2)  design  flaws,  and  3)  inadequate 
testing  of  the  new  system  being  introduced. 


At this point, it is appropriate to consider how the preceding discussion applies to
Commercial-Off-The-Shelf (COTS) items and Nondevelopmental Items (NDI). These items undergo the
same development process shown in figure 5. In this case, the development process is geared toward
an application domain, such as a commercial office environment, which can differ greatly from a
military C4ISR environment. Therefore, it is incumbent upon the acquisition agent to perform
adequate requirements, trade-off, and modification analyses to ensure suitability for the
intended military application. [SD-2, 1996]

The  following  system  selection  criteria  and  rationale  are  offered  to  guide  the  selection  of  15 
C4ISR systems for initial study to look for evidence to refute or lend credence to the hypothesis
and  the  early  failure  period  for  a  system  interoperability  life  distribution  model. 

1.  The  system  should  have  been  introduced  within  the  last  five  years.  The  rationale  is  the 
potential  greater  availability  of  data  and  information  about  the  system’s  introduction. 

2.  The system should be new, representing the first use of the system, or should have
undergone a major upgrade. The rationale is to capture the early history of system
interoperability interactions.

3.  The  systems  should  include  ones  that  are  tightly  and  loosely  coupled  and  ones  with  linear  and 
complex  interactions.  The  rationale  here  is  to  look  for  failure  dependencies  based  on  the 
quality  or  nature  of  interoperability  in  the  early  failure  period. 

7.  Mediate  Failure  Period 

The mediate failure period begins after the early failure period ends and extends until the
relative obsolescence failure period begins, if it ever does. What types of interoperability failure mechanisms
are  expected  to  dominate  in  the  mediate  failure  period? 

The  process  depicted  in  figure  6  is  used  to  understand  the  failure  mechanisms  that  are  likely  to 
dominate  in  the  mediate  failure  period.  In  this  period,  the  two  systems  have  a  history  of 
interoperating and the majority of the intended functionality of the systems has been exercised
and made to work. Here two types of circumstances are postulated to lead to interoperability
failures: those arising from rare functional threads that trigger latent defects and those arising
from  new  modes  of  operation  and  procedures  that  lead  to  unintended  functionality  that  is  not 
provided  for  in  the  system  design  implementation. 


Figure  6.  Process  diagram  used  to  discuss  causal  relationships  for  the 
mediate  failure  period  failure  mechanisms. 

Root causes for defects associated with rare functional threads and events are the same as those
identified  under  the  development  process  described  in  the  early  failure  period. 

Root  causes  for  defects  associated  with  new  modes  of  operation  and  procedures  are  similar  to 
those  arising  in  the  requirements  phase  of  the  development  process.  The  difference  here  is  that 
this  unintended  functionality  would  not  normally  be  captured  in  the  requirements. 

Based on the preceding discussion of the causal relationships postulated to be behind failure
mechanisms in the mediate failure period, the following hypothesis is generated:


HYPOTHESIS:  Interoperability  failure  mechanisms  expected  to  cause  most  of  the  failures  in  the 
mediate  failure  period  stem  from  1)  rarely  exercised  functional  threads  and  events  and  2)  new 
operational  modes  and  procedures. 


The  following  system  selection  criteria  and  rationale  are  offered  to  guide  the  selection  of  15 
C4ISR  systems  for  initial  study  to  look  for  evidence  to  refute  or  lend  credence  to  the  hypothesis 
and  the  mediate  failure  period  for  a  system  interoperability  life  distribution  model.  The  systems 
of interest here should have been used in warfighting experimentation, forward-looking
exercises, or new operational contexts, or should have experienced interoperability failures not
attributable to the early or relative obsolescence failure periods.

1.  The system should have been interoperating for at least 18 to 24 months before the experiment,
exercise, or interoperability failure occurs. The rationale here includes getting beyond the
early  failure  period. 

2.  The  systems  should  include  ones  that  are  tightly  and  loosely  coupled  and  ones  with  linear  and 
complex  interactions.  The  rationale  here  is  to  look  for  failure  dependencies  based  on  the 
quality  or  nature  of  interoperability  in  the  mediate  failure  period. 

8.  Relative  Obsolescence  Failure  Period 

The  relative  obsolescence  failure  period  occurs  if  and  when  the  number  and  scale  of  software 
and  hardware  upgrades  in  one  system  begin  to  outpace  those  of  another  system.  What  types  of 
interoperability  failure  mechanisms  might  one  expect  to  see  dominating  in  the  relative 
obsolescence  failure  period? 

The process depicted in figure 7 is used to understand the failure mechanisms that are likely to
dominate  in  the  relative  obsolescence  failure  period.  They  are  the  same  process-related 
mechanisms  that  were  discussed  in  the  early  failure  period.  The  difference  here  is  that  these 
systems  have  a  history  of  interoperation  before  the  upgrade(s)  and  the  interoperability  failures 
will  most  likely  be  limited  to  those  functional  areas  most  affected  by  the  upgrades.  However, 
since  complex  systems  are  sometimes  involved,  unexpected  problems  could  appear  in  other  areas 
not  directly  involved  with  the  upgrades.  For  this  reason,  regression  testing  techniques  are  often 
used  to  help  check  for  defects  that  cause  problems  beyond  the  immediate  modules  being 
modified.  [McConnell,  1998,  pp.  215-219] 

[Figure 7 diagram; recoverable text: One or more hardware and/or software upgrades in one
system relative to another introducing interoperability faults]

Figure 7. Process diagram used to discuss causal relationships for the
relative obsolescence failure period.

Root  causes  for  defects  associated  with  upgrades  are  the  same  as  those  identified  under  the 
development  process  described  in  the  early  failure  period. 

Based on the preceding discussion of the causal relationships postulated to be behind failure
mechanisms in the relative obsolescence failure period, the following hypothesis is generated:


HYPOTHESIS:  Interoperability  failure  mechanisms  expected  to  cause  most  of  the  failures  in  the 
relative  obsolescence  failure  period  stem  from  1)  inadequate  requirements,  2)  design  flaws,  and 
3)  inadequate  testing  when  one  system  is  upgraded  one  or  more  times  relative  to  the  other. 


The  following  system  selection  criteria  and  rationale  are  offered  to  guide  the  selection  of  15 
C4ISR  systems  for  initial  study  to  look  for  evidence  to  refute  or  lend  credence  to  the  hypothesis 
and  the  relative  obsolescence  failure  period  for  a  system  interoperability  life  distribution  model. 
The  systems  of  interest  here  should  be  “legacy”  systems  not  having  been  upgraded  in  several 
years. 

1.  The system should have been introduced more than 5 years ago and should not have received
a  major  upgrade  in  the  last  3  years.  The  rationale  is  to  focus  on  older  systems  looking  for 
instances  where  other  interoperating  systems  have  been  upgraded. 

8th  ICCRTS  16  of  20  Approved  for  public  release:  03/13/03 


2.  The  systems  should  include  ones  that  are  tightly  and  loosely  coupled  and  ones  with  linear  and 
complex  interactions.  The  rationale  here  is  to  look  for  failure  dependencies  based  on  the 
quality  or  nature  of  interoperability  in  the  relative  obsolescence  period. 
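
As an illustration only, the age-related selection criteria from Sections 6 through 8 can be sketched as a screening function. The record fields, the class name `Candidate`, and the choice of 18 months as the mediate cutoff (the lower bound of the 18 to 24 month range) are assumptions made for this sketch; the first-use-or-major-upgrade criterion for the early period and the experimentation criterion for the mediate period are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical record describing one candidate system."""
    name: str
    years_since_intro: float
    years_since_major_upgrade: Optional[float]  # None means never upgraded
    months_interoperating: float
    coupling: str     # "tight" or "loose"
    interaction: str  # "linear" or "complex"


def eligible_periods(c: Candidate) -> List[str]:
    """Return the failure periods whose age criteria the candidate meets."""
    periods = []
    # Early: introduced within the last five years.
    if c.years_since_intro <= 5:
        periods.append("early")
    # Mediate: interoperating for at least 18 months (assumed cutoff).
    if c.months_interoperating >= 18:
        periods.append("mediate")
    # Relative obsolescence: older than five years, no major upgrade in three.
    no_recent_upgrade = (c.years_since_major_upgrade is None
                         or c.years_since_major_upgrade > 3)
    if c.years_since_intro > 5 and no_recent_upgrade:
        periods.append("relative obsolescence")
    return periods


def has_required_mix(sample: List[Candidate]) -> bool:
    """Check that a sample spans both coupling and both interaction types."""
    couplings = {c.coupling for c in sample}
    interactions = {c.interaction for c in sample}
    return {"tight", "loose"} <= couplings and {"linear", "complex"} <= interactions


# Placeholder systems, not real equipment.
new_radio = Candidate("new radio", 2, None, 12, "tight", "complex")
legacy_switch = Candidate("legacy switch", 8, 6, 96, "loose", "linear")
print(eligible_periods(new_radio))       # ['early']
print(eligible_periods(legacy_switch))   # ['mediate', 'relative obsolescence']
```

A system can qualify for more than one period under these simplified rules, so an investigator would still need to assign each system to the period in which its failure history is to be studied.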

9.  Creating  a  Prediction  Rule  for  Equipment  Strings  Based  on  the  Theory 

In  this  section,  a  brief  sketch  is  provided  of  the  general  approach  to  be  taken  once  some  initial 
data has been collected and insights into the interoperability interaction and coupling between
two systems over time have been gained.

The  idea  is  to  apply  the  “Science  of  Diagnostics”  to  extend  the  initial  insights  into  the 
interoperability  between  two  systems  over  time  based  on  the  data  collected  and  analyzed  for  45 
systems.  The  goal  is  to  develop  the  capability  to  predict  an  interoperability  failure  in  a  series  of 
connected systems forming an equipment string. This will entail building Statistical Prediction
Rules  (SPR)  to  be  used  to  make  binary  “yes”  or  “no”  decisions  about  performing  end-to-end 
interoperability  testing  on  an  equipment  string.  Achieving  a  statistically  significant  prediction 
rule  may  require  collecting  data  on  additional  systems.  [Swets  et  al.,  2000] 

The  accuracy  and  utility  of  the  diagnostic  protocol  based  on  the  SPR  will  be  characterized  using 
Relative  Operating  Characteristic  (ROC)  analysis.  ROC  analysis  provides  a  measure  of 
diagnostic accuracy that is independent of fault event frequencies and the decision criterion. The
interested  reader  is  referred  to  the  paper  by  Swets,  Dawes,  and  Monahan  for  a  more  complete 
discussion  of  these  techniques.  [Swets  et  al.,  2000] 
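
To make the idea of a binary prediction rule concrete, the following minimal sketch treats an SPR as a weighted sum of predictor variables compared against a decision threshold. The variable names, weights, and threshold are hypothetical placeholders; the paper does not specify them, and the linear form is an assumption made for illustration.

```python
from typing import Dict

# Hypothetical predictors and weights; real values would come from the
# statistical analysis of collected data.
WEIGHTS: Dict[str, float] = {
    "tight_coupling": 1.5,
    "complex_interaction": 1.0,
    "recent_upgrade_on_one_side": 2.0,
}


def spr_score(features: Dict[str, float],
              weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of predictor values; missing predictors count as 0."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def recommend_end_to_end_test(features: Dict[str, float],
                              threshold: float = 2.0) -> bool:
    """Binary yes/no decision: test the string if the score reaches the threshold."""
    return spr_score(features) >= threshold


pair = {"tight_coupling": 1.0, "complex_interaction": 1.0}
print(spr_score(pair), recommend_end_to_end_test(pair))
```

Moving the threshold trades missed failures against unnecessary tests, which is the trade-off that ROC analysis characterizes.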

10.  Next  Steps

The  next  steps  for  this  research  initiative  include  collecting  the  data  required  to  refute  or  lend 
credence  to  the  theory,  refining  the  theory  based  on  the  collected  data,  determining  the 
relationships  between  the  causal  factors  and  interoperability  failures,  building  prediction  rules 
based  on  the  data,  extending  the  data  collection  to  a  statistically  significant  number  of  cases,  and 
building  statistical  prediction  rules. 

The  primary  data  collection  effort  will  focus  on  45  systems  with  15  systems  targeted  for  each 
time  period  in  the  postulated  life  distribution  model.  Initially,  one  to  three  systems  will  be 
investigated  to  discover  the  interview  and  data  collection  processes  that  are  the  most  effective  in 
uncovering  the  interoperability  history  of  a  system  and  addressing  the  experimental  objectives. 
Then  more  investigators  will  be  trained  and  begin  collecting  data  on  all  45  systems. 

It  is  envisioned  that  data  collection  will  follow  two  simultaneous  paths.  The  first  path  consists  of 
searches  of  known  reliability,  trouble  report,  and  issues  databases  for  relevant  information  on  the 
system  under  investigation.  The  second  path  consists  of  structured  interviews  with  the  system 
program managers, in-service engineering agents, installers, and operators to garner anecdotal
evidence,  stories,  and  leads  to  other  sources  of  relevant  knowledge. 

11.  Summary


A  theory  of  interoperability  failures  has  been  developed  and  presented  in  this  paper.  It  considers 
the  interaction  of  two  systems  over  time  to  postulate  a  life  distribution  model  with  three  distinct 
time  periods:  early,  mediate,  and  relative  obsolescence.  A  causal  analysis  that  focuses  on 
intended  functionality,  requirements,  design  implementation,  and  developmental  testing  was 
used  to  explain  the  existence  of  these  three  time  periods.  Interoperability  and  complexity  were 
discussed  in  terms  of  system  interaction  and  coupling.  Criteria  were  provided  for  selecting 
systems  to  study  in  hopes  of  refuting  or  lending  credence  to  the  theory.  An  approach  for 
extending  the  theory  to  create  statistical  prediction  rules  for  equipment  strings  was  also 
discussed. 

•  Achieving interoperability among C4ISR
systems remains a challenge for the U.S.
Department of Defense

•  Progress  has  been  made  in  recent  years 
through  the  use  of: 

-  directives  and  guidance 

-  increased  awareness 

-  emphasis  on  capability  vice  platforms 

-  integrated  architectures 

-  mission  capability  packages 

•  However ... 

One element missing from this
mix  is  a  coherent,  verifiable 
theory  of  interoperability  failures 
that  captures  the  causes  of 
interoperability  faults  in  a  form 
that  practitioners  can  use  to  avoid 
problems  in  their  own  work 

•  Purpose: Develop a theory of interoperability
failures  that  can  be  confirmed  through 
objective  evidence 

•  Goal:  To  be  able  to  efficiently  collect  the  data 
required  to  create  and  validate  prediction 
rules  that  can  be  used  to  make  diagnostic 
decisions  about  conducting  end-to-end 
interoperability  testing  of  C4ISR  equipment 
strings 

•  “Interoperability: A New Paradigm” 1999
paper  by  Paul  Sutton  where  an  analogy  is 
drawn  between  interoperability  and  electronic 
equipment  reliability 

•  Two papers presented at last year’s
ICCRTS  &  CCRTS  by  John  Hamilton,  Pam 
Sanders,  CAPT  John  Melear,  and  George 
Endicott  where  interoperability  is  dealt  with 
using  an  engineering  life  cycle  model 

See  [Sutton,  1999]  [Hamilton  et  al.,  2002a]  [Hamilton  et  al.,  2002b] 

U.S. DoD Definition


“(1) The ability of the systems, units, or forces to
provide services to and accept services from
other systems, units, or forces, and to use the
services so exchanged to enable them to
operate effectively together, and (2) the
condition achieved among communications-
electronics systems or items of
communications-electronics equipment when
information or services can be exchanged
directly and satisfactorily between them or their
users. The degree of interoperability should be
defined when referring to specific cases.”


Source  [CJCS,  2000] 

End-to-end interoperability - “The probability of
successful interoperation of all subscribers in a
network under specified conditions for a given
mission time.”

Interoperability failure - “The inability of the
network to meet specified interoperability levels,
conditions, and requirements, such as minimum
acceptable data transfer rate, quality of service,
and maximum allowable latency.”


Source  [Sutton,  1999] 

Equipment string - a serial sequence of N
systems  connected  by  N-1  links  that  provides  a 
communications  path  between  users  to 
exchange  information 

Functional  thread  -  a  construct  consisting  of  the 
equipment  string  input,  equipment  string  output, 
a  description  of  the  transformations  to  be 
performed  and  the  conditions  under  which  this 
should  occur.  See  [INCOSE,  2000] 
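
The equipment string definition above can be sketched as a simple data structure: an ordered list of N systems whose adjacent pairs form the N-1 links. The function name and the system names below are placeholders chosen for this sketch.

```python
from typing import List, Tuple


def pairwise_links(equipment_string: List[str]) -> List[Tuple[str, str]]:
    """Return adjacent system pairs; N systems in series yield N-1 links."""
    return list(zip(equipment_string, equipment_string[1:]))


# Hypothetical four-system string (names are placeholders, not real equipment).
string = ["terminal A", "crypto B", "radio C", "terminal D"]
for left, right in pairwise_links(string):
    print(f"{left} <-> {right}")
```

Each pair returned here is one system-to-system interaction of the kind the theory considers.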

Interoperability - “The ability of two or more
systems to exchange information and to
mutually use the information that has been
exchanged.” [IEEE, 1988]

Interoperability  fault  -  A  defect  or  condition 
related  to  system  interaction  that  causes  a 
reproducible  malfunction  in  the  ability  of  two  or 
more  systems  to  exchange  information  and  use 
the  information  once  exchanged.  Note:  a 
malfunction  is  considered  reproducible  if  it 
occurs  consistently  under  the  same 
circumstances.  [Adapted  from  FS-1037C,  1996] 

Interoperability failure - “The inability, due to an
interoperability fault, of two or more systems to
exchange information and to mutually use the
information once exchanged.”

Why do we need a theory of
interoperability failures?

[Book cover: TO ENGINEER IS HUMAN: The Role of Failure in Successful Design, by Henry Petroski]


True  advances  in  engineering 
design  often  depend  on 
gaining  a  deeper 
understanding  of  how  things 
fail.  Think  of  19th  century 
steel  railroad  bridges  and  the 
de  Havilland  Comet  aircraft. 


Why should we think that
designing systems of systems
that resist interoperability
failures would be any
different?

SPAWARSYSCEN Charleston Code 50E

[Diagram: THEORY informs EXPERIMENT; EXPERIMENT provides evidence for/against THEORY.
Theory steps (marked "Start Here"): Curious Observation, Plausible Concept, Form Hypotheses,
Form Model, Test Consistency.
Experiment steps: Design, Sample, Collect Data, Reduce Data, Analyze Data, Interpret Results.]

Sutton’s Analogy


•  Interoperability:  A  New  Paradigm 

•  Draws  on  the  analogy  of  electronic  equipment 
reliability  to  postulate  a  theory  of 
interoperability  failures 

•  Assumes  random  interoperability  failures  and 
a  constant  interoperation  failure  rate 

•  Leads  to  a  large  list  of  potential  contributing 
factors  to  be  studied 


On the right track ... But, Challenge the Assumptions!


See  [Sutton,  1999] 
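
For reference, a constant failure rate has a simple closed form in standard reliability theory. The symbols below are assumed notation for this note and do not appear in the slides: \lambda is the constant interoperation failure rate, t is the time two systems have been interoperating, R(t) is the probability of no failure up to time t, and F(t) is the probability of at least one failure by time t.

```latex
% Assumed notation: \lambda = constant interoperation failure rate,
% t = time the two systems have been interoperating.
R(t) = e^{-\lambda t}, \qquad
F(t) = 1 - R(t) = 1 - e^{-\lambda t}, \qquad
\text{mean time between failures} = \frac{1}{\lambda}
```

Under this assumption the failure rate does not depend on how long the systems have interoperated, which is the assumption being challenged.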

[Figure: The Bathtub Curve]


Source  [NIST,  2003] 

•  Consider interoperability interaction between
two  systems  over  time 

•  Assume  resulting  model  can  be  applied  to 
equipment  strings  by  pair-wise  extension 

•  Power  of  analogy  is  that  “different  failure 
mechanisms  may  tend  to  dominate  at 
different times”

•  Time  in  this  analogy  is  the  time  that  two 
systems  have  been  interoperating 


See  [Sutton,  1999] 

•  Early: relatively high failure rate; the two
systems have little or no experience
interoperating with each other.

•  Mediate: relatively low failure rate; the two
systems have some experience and a history
of interoperating with each other.

•  Relative obsolescence: relative failure rate
that increases over time; occurs when one
system’s hardware or software is upgraded
faster than the other system.
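
The three periods can be sketched as a piecewise failure-rate function of the time two systems have been interoperating. All numeric values, the period boundary, and the linear growth in the last period are hypothetical placeholders chosen only to show the shape; the early period is modeled as a constant high rate for simplicity, although a declining rate, as in the bathtub curve, would be an alternative.

```python
from typing import Optional


def failure_rate(t_months: float,
                 early_end: float = 18.0,
                 obsolescence_start: Optional[float] = None,
                 early_rate: float = 0.20,
                 mediate_rate: float = 0.02,
                 growth_per_month: float = 0.005) -> float:
    """Hypothetical interoperability failure rate (failures per month)."""
    if t_months < early_end:
        # Early: relatively high rate, little experience interoperating.
        return early_rate
    if obsolescence_start is None or t_months < obsolescence_start:
        # Mediate: relatively low rate; lasts indefinitely if no upgrade gap opens.
        return mediate_rate
    # Relative obsolescence: rate increases over time.
    return mediate_rate + growth_per_month * (t_months - obsolescence_start)


for t in (6, 30, 80):
    print(t, failure_rate(t, obsolescence_start=60.0))
```

Passing `obsolescence_start=None` reflects the case in which the relative obsolescence period never begins.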

[Figure: postulated life distribution curve; recoverable labels: Mediate, Relative Obsolescence]

Expected Trends?

Critical System Properties:
Survey and Taxonomy

Original version published in Reliability Engineering and System Safety, Vol. 43,
No. 2, pp. 189-219, 1994


John Rushby

Computer Science Laboratory
SRI International
Menlo Park CA 94025 USA

Technical Report CSL-93-01, May 1993
Revised February 1994

There is more to the picture than a life
distribution model based on the
time two systems have been
interoperating: the character of
system-to-system interaction
also needs to be considered...


See  [Perrow,  1984]  See  [Rushby,  1994] 

Interaction - “Ranges from “linear” to
“complex,” refers to the extent to which the
behavior of one component in a system can
affect the behavior of other components.”

Coupling - “can range from “loose” to “tight,”
refers to the extent to which there is
metaphorical “slack” or “flexibility” in the
system. Loosely coupled systems are usually
less time constrained than tightly coupled ones,
can tolerate things being done in different
sequences than those expected, and may be
adaptable to different assumptions than those
originally considered.”

Source  [Rushby,  1994,  Chapter  4,  p.  42] 

Early Failure Period

[Diagram: system development process; recoverable block labels: Intended Functionality,
Design Implementation, Developmental Testing; arrow label: "Translated into"]

Both systems go through this process; faults can be introduced in the first three
blocks and not detected in the last.

Early Failure Period

Expected causes:

•  Missing  or  inadequate  requirements 

•  Design  flaws 

•  Inadequate  testing 

System selection criteria:

•  System  introduced  in  last  5  years 

•  First  use  or  major  upgrade 

•  Mix  of  1)  tightly  and  loosely  coupled 
and  2)  linear  and  complex  interactions 

[Diagram: process for the mediate failure period; recoverable labels: Rare Threads,
Operation and Procedures, Latent Defect, Unintended Functionality, "Captured in"]

Mediate Failure Period

Expected  causes: 

•  New  modes  of  operation  and  procedures 
leading  to  unintended  functionality 

•  Rare  threads  or  events  that  trigger  latent 
defects 

System selection criteria:

•  Systems interoperating for at least 18 to 24
months before experiment, exercise, or
failure occurrence.

•  Mix of 1) tightly and loosely coupled and 2)
linear and complex interactions

One or more hardware and/or software
upgrades in one system relative to
another introducing interoperability
faults

Relative Obsolescence Failure Period

Expected  causes: 

•  Missing  or  inadequate  requirements 

•  Design  flaws 

•  Inadequate  testing 

System selection criteria:

•  System  introduced  more  than  5  years  ago 

•  No  major  upgrades  in  last  3  years 

•  Mix  of  1)  tightly  and  loosely  coupled  and  2) 
linear  and  complex  interactions 

Creating a Prediction Rule Based on the Theory

•  First, build a Statistical Prediction Rule (SPR)
to make binary “yes” or “no” decisions
about whether a particular system-to-system pair
will have an interoperability failure

•  Then,  extend  the  resulting  model  to 
equipment  strings  using  pair-wise  analysis 

•  Statistical analysis is used to quantify the power
of  candidate  predictive  variables  to  discriminate 
between  positive  and  negative  instances  of  the 
diagnostic  alternatives  under  study 

•  Variables may be added to an SPR and assigned
their  respective  weights  in  a  stepwise  fashion 

•  An  SPR  can  be  constructed  using  both  objective 
and  subjective  factors 

•  An  SPR  ends  up  being  a  set  of  variables  and 
weights 


From  [Swets  et  al.,  2000] 

Consider this example
to understand how
Statistical Prediction
Rules work. Shown
here are probability
distributions of eye
pressures for both
healthy people and
those with glaucoma.

Establishing a decision threshold of 30 for diagnosing patients with glaucoma
results in an accurate diagnosis of about 50% of the
diseased population, P(True Positive), while about 10% of
the healthy population will be misdiagnosed with the
disease, P(False Positive), also called false alarms.

From  [Swets  et  al.,  2000] 
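
The threshold example can be reproduced numerically under assumed distributions. The normal shapes and their parameters below are hypothetical, picked only so that a threshold of 30 gives roughly the 50% and 10% figures quoted above; they are not taken from [Swets et al., 2000].

```python
from statistics import NormalDist

# Assumed (hypothetical) eye-pressure distributions.
healthy = NormalDist(mu=20.0, sigma=7.8)
glaucoma = NormalDist(mu=30.0, sigma=8.0)


def rates_at(threshold: float):
    """Return (P(True Positive), P(False Positive)) when pressures
    above the threshold are diagnosed as glaucoma."""
    p_tp = 1.0 - glaucoma.cdf(threshold)  # diseased patients above threshold
    p_fp = 1.0 - healthy.cdf(threshold)   # healthy patients above threshold
    return p_tp, p_fp


p_tp, p_fp = rates_at(30.0)
print(f"P(TP) = {p_tp:.2f}, P(FP) = {p_fp:.2f}")
```

Lowering the threshold raises both probabilities; raising it lowers both.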

Receiver Operating Characteristic

A Receiver Operating
Characteristic (ROC)
curve is created by
plotting the areas under
the distributions for each
possible threshold value.

For example, a threshold
of 30 corresponds to the
point where P(FP) (x-axis) = 0.1
and P(TP) (y-axis) = 0.5.

This represents an approximate
threshold of S = 2. The
diagonal line represents
“chance” accuracy, a
50/50 ratio of True Positive
to False Positive.
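
A minimal sketch of how such a curve can be traced: sweep the decision threshold, record the (P(FP), P(TP)) pair at each value, and estimate the area under the resulting curve with the trapezoid rule. The two normal distributions are the same kind of hypothetical stand-ins as in the threshold example; an area of 0.5 corresponds to the chance diagonal and 1.0 to perfect discrimination.

```python
from statistics import NormalDist

# Assumed (hypothetical) distributions of the diagnostic variable.
healthy = NormalDist(mu=20.0, sigma=7.8)
diseased = NormalDist(mu=30.0, sigma=8.0)


def roc_points(thresholds):
    """Return (P(FP), P(TP)) pairs, one per threshold, sorted by P(FP)."""
    pts = [(1.0 - healthy.cdf(t), 1.0 - diseased.cdf(t)) for t in thresholds]
    return sorted(pts)


def area_under_curve(points):
    """Trapezoid-rule area under the ROC curve, anchored at (0,0) and (1,1)."""
    pts = [(0.0, 0.0)] + list(points) + [(1.0, 1.0)]
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


points = roc_points(range(0, 61, 2))
auc = area_under_curve(points)
print(f"area under ROC curve = {auc:.2f}")
```

Because the area summarizes all thresholds at once, it does not depend on which single threshold is eventually chosen.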

Where are these techniques being used?

•  Numerous fields including medical diagnostics,
predicting violence among criminals, weather
forecasting, law school admissions, aircraft
cockpit warnings, quality of sound in opera
houses, and predicting wine vintage quality.

•  The following example is taken from the field of
medical diagnosis where several different pieces
of information are combined to judge whether
prostate cancer has spread in a patient...


From  [Swets  et  al.,  2000] 
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SPR  for  Prostrate  Cancer 
Empirical Receiver Operating Characteristic (ROC) curves for
determining the extent of prostate cancer, based on SPRs
(Statistical Prediction Rules), using one, two, three, or four
predictor variables. The closer a curve is to the upper left,
the higher the SPR's accuracy.
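
One common way to put a number on "closer to the upper left" is the area under the ROC curve, where 0.5 corresponds to the chance diagonal and 1.0 to a perfect rule. The sketch below is a hedged illustration of that idea, not necessarily the measure used on the slide. The function name `area_under_roc` and the two sets of points are made up for the example and are not the prostate cancer data from [Swets et al., 2000].

```python
def area_under_roc(points):
    """Trapezoidal area under an ROC curve given (P(FP), P(TP)) points."""
    # Always include the two corner points every ROC curve passes through.
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Made-up illustrative points, NOT data from the cited study.
one_predictor = [(0.1, 0.3), (0.3, 0.6), (0.6, 0.85)]
four_predictors = [(0.1, 0.6), (0.3, 0.85), (0.6, 0.97)]

chance = area_under_roc([])  # only the diagonal from (0, 0) to (1, 1)
print(chance, area_under_roc(one_predictor), area_under_roc(four_predictors))
```

With these invented points, the curve that bows further toward the upper left yields the larger area, which matches the visual reading of the figure.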


From  [Swets  et  al.,  2000] 
[Figure: Air Defense System Integrator (ADSI) and Protocol Translator]
Next Steps
• Refine initial system selection criteria

•  Collect  and  analyze  data  on  initial  systems  to 
be  studied 

• Investigate establishing a center for studying
interoperability failures at U.S. JFCOM, J8,
Joint Interoperability and Integration (JI&I)

• Leverage NIST efforts and tools (Error, Fault,
and Failure data collection and analysis tool)

•  Foster  a  continuing  dialog  through  this  forum 
and  others 
Summary
• A theory of interoperability failures has
been developed

•  It  considers  the  interaction  of  two 
systems  over  time 

• It postulates three distinct time periods:

-  Early 

-  Mediate 

-  Relative  obsolescence 


• Some representative systems need to be
studied to refute or lend credence to
the theory
Questions?
Backups
1. Does not address specific electrical interfaces

2.  Does  not  address  objects  and  object  model 
compatibility 

3.  Assigns  nominal  values  based  on 
documentation,  not  objective  system 
performance 

4.  Does  not  take  into  account  that  some  systems 
may  not  need  higher  levels  of  interoperability 
to  be  considered  successful 

5.  Does  not  explain  how  interoperability  can  be 
controlled,  changed,  or  improved 


Source  [Sutton,  1999] 